The Labelling of Prominence in Swedish by Phonetically Experienced Transcribers
Abstract
An IPA-based system has been agreed upon for labelling Swedish prosody. In the present study this system is evaluated by assessing the inter-transcriber reliability in prominence labelling of nine expert subjects. The study also explores the acoustic (F0) basis for observed variability in the assignment of focus accent, the highest prominence label.
INTRODUCTION
Recently, as large corpora of prosodically labelled speech are needed for quantitative computational modelling of speech, great efforts are being made to develop transcription systems that meet high standards of reliability. Thus, before a system is put to extensive use, it must be evaluated. The ToBI (Tones and Break Indices) system developed for transcribing English prosody has been evaluated in a number of studies, e.g. [1,2]. Reyelt [3] evaluated a number of variants of prosodic transcription for German within the VERBMOBIL project. For Swedish, an IPA-based system has been agreed upon for labelling prosody (prominence and boundary phenomena), the details of which are described in [4]. We have used this system in two studies [5,6] comparing the labelling of boundaries and prominences in spoken Swedish made by phonetically experienced and non-experienced transcribers.
In the present study, the scope has been widened. One purpose, which it shares with the former studies [5,6], is to evaluate the transcription system used for labelling. In particular, we want to estimate the extent to which experienced phoneticians and speech researchers vary in their labelling of prominences when presented with samples of read and spontaneous Swedish. In addition, the study aims at exploring the acoustic basis, specifically F0-characteristics, for the variability in labelling that we predict will occur. In particular, we want to establish the extent to which the variability associated with the assignment of focus accent can be explained in terms of F0-cues.
Beckman [7] reviews the research on acoustic correlates of perceived stress in English. Referring to study [8], Beckman [7, p 60-62] makes clear that the dependence of perceived stress on F0-cues is complex and varies with the position of the word in the sentence. Further, Wells [9] concludes that F0-cues play an important role for perceived prominence in English, although various other cues contribute, too. F0 is not assumed to be the only cue to prominence in Swedish (Bruce [10] also mentions temporal correlates, and data reported in [11] likewise indicate temporal correlates), but it is believed to be an important determiner of focus accent. Thus, relating perceived focus accent to F0-events seems reasonable in the light of previous research [12], according to which focus accent is intimately tied to an F0-rise following a word accent F0-fall that is timed differently for words with acute and grave accent, respectively.
EVALUATION OF THE TRANSCRIPTION SYSTEM
Method
The 9 subjects participating in the study are all phoneticians or speech researchers with wide experience in prosody, from different sites in Sweden. All are native-born Swedes. The subjects transcribed two kinds of recorded speech material. One was an excerpt, 233 words long, from an authentic news cable read aloud. The other was a 252-word-long excerpt of spontaneous speech, a retelling of the story read aloud. Both recordings were made in a soundproof room and rendered by the same male Swedish speaker. Each expert was sent the recorded material and instructions for labelling prominence according to the IPA-based Swedish system. Following this system, four levels of prominence were distinguished and labelled accordingly for each word in the material: no stress (unmarked), secondary stress ("), primary stress/accented (’) and focus accent (’’).
Subsequent analyses included coding the data (no stress=0; secondary stress=1; primary stress=2; focus accent=3) and statistical analyses to estimate reliability.
Labelling data
Table 1 shows the labelling of prominences by the nine experts in a sample of the read material. The words in the text are ordered vertically in the first column. The following nine columns contain the individual labellings of the transcribers and the tenth column the means of these labellings for each word. The data presented give a rough indication of the reliability of labelling.
Table 1. Labelling by nine transcribers. 0=no stress, 1=secondary stress, 2=primary stress, 3=focus accent.
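The coding scheme and the per-word means described above can be illustrated with a minimal sketch. The label names, the three-transcriber toy data, and the helper functions below are hypothetical and are not taken from the study. Mean pairwise percent agreement is used here only as one simple, assumed way to summarise agreement; the text does not state which reliability statistic the authors computed.

```python
from itertools import combinations

# Numeric codes as given in the text:
# no stress=0, secondary stress=1, primary stress=2, focus accent=3.
# The string keys are hypothetical names chosen for this sketch.
CODES = {"none": 0, "secondary": 1, "primary": 2, "focus": 3}


def code_labels(labels):
    """Convert one transcriber's prominence labels to numeric codes."""
    return [CODES[label] for label in labels]


def word_means(matrix):
    """Mean code per word; matrix[t][w] is transcriber t's code for word w."""
    return [sum(column) / len(column) for column in zip(*matrix)]


def pairwise_agreement(matrix):
    """Share of identical codes, pooled over all transcriber pairs and words."""
    pairs = list(combinations(matrix, 2))
    n_words = len(matrix[0])
    matches = sum(a == b for row1, row2 in pairs for a, b in zip(row1, row2))
    return matches / (len(pairs) * n_words)


# Hypothetical toy data: three transcribers labelling the same four words.
raw = [
    ["none", "primary", "none", "focus"],
    ["none", "primary", "secondary", "focus"],
    ["none", "focus", "none", "focus"],
]
coded = [code_labels(row) for row in raw]
means = word_means(coded)
agreement = pairwise_agreement(coded)

print(means)
print(agreement)
```

In this toy layout each row is a transcriber and each column a word, so `word_means` corresponds to the tenth column of Table 1. A chance-corrected measure could replace `pairwise_agreement` without changing the data layout.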
Similar resources
Prosodic Labelling and Acoustic Data
Data on the labelling of boundaries and prominences in read and spontaneous speech have been collected from ten non-expert and one expert transcribers and analyzed for their inter-subjective variability. The labellings are matched with acoustic data to explore the relevant cues used by the transcribers. INTRODUCTION Most work on prosody relies on some kind of labelling of the prosodic features ...
Variations in the perceptual modelling of macro-prosodic organization of spoken Swedish: prominence and chunking
In a pilot investigation, aiming to develop new methodological insights into the study of perceptual modelling of the macro-prosodic organization of spoken Swedish, different aspects of the listeners’ variation were studied. Two listener groups, students at the beginner’s level and trained phoneticians, had to mark the most prominent words and the chunks they could hear in speech samples of spo...
Automatic labelling of German prosody
One limitation in prosody research is the lack of sufficient prosodically labelled speech data. In this paper, we present research on an automatic labelling system that is able to produce a phonological tonal labelling according to the ToBI like intonation model for German developed by Féry. The system is not totally dependent on the specific language and/or labelling system, as it uses corpus ...
Inter-transcriber reliability of ToBI prosodic labeling
The goal of this study was to evaluate the reliability among transcribers of a standard prosodic labeling system under relatively optimal conditions of training, supervision, facilities, procedures, and extent of speaker familiarity. The ToBI (Tones and Break Indices) model for standard American English[7][1] was used in the study; break indices indicate the degree of junction between words, pi...
Publication date: 1995